AEGD: adaptive gradient descent with energy
Authors
Abstract
We propose AEGD, a new algorithm for the optimization of non-convex objective functions, based on a dynamically updated 'energy' variable. The method is shown to be unconditionally energy stable, irrespective of the base step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired rates of batch gradient descent. We also provide an energy-dependent bound on stationary convergence in the stochastic setting. AEGD is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that it works well for a large variety of problems. Specifically, it is robust with respect to initial data and capable of making rapid initial progress, and it shows comparable, often better, generalization performance than SGD with momentum on deep neural networks. The code is available at https://github.com/txping/AEGD.
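The abstract describes AEGD only at a high level. As a rough illustration, the sketch below implements an energy-based gradient update in the spirit of that description: the objective is shifted by a constant c so that f(θ)+c stays positive, a per-coordinate 'energy' variable r starts at sqrt(f(θ0)+c) and can only shrink (which is what makes the step unconditionally energy stable regardless of the step size), and the parameters move along the gradient scaled by r. The specific update formulas, the constant c, and the function names are assumptions for illustration, not taken from this page; the authoritative implementation is in the linked repository.

```python
import numpy as np

def aegd_sketch(grad_f, f, theta0, eta=0.1, c=1.0, n_steps=100):
    """Minimal AEGD-style sketch (assumed update rule, not taken from this page).

    F(theta) = sqrt(f(theta) + c) plays the role of the 'energy'; r tracks F
    per coordinate and is updated so that it can only decrease, giving
    unconditional energy stability irrespective of the base step size eta.
    """
    theta = np.asarray(theta0, dtype=float)
    r = np.full_like(theta, np.sqrt(f(theta) + c))          # r_0 = F(theta_0)
    for _ in range(n_steps):
        v = grad_f(theta) / (2.0 * np.sqrt(f(theta) + c))   # v_k ~ grad F(theta_k)
        r = r / (1.0 + 2.0 * eta * v * v)                    # energy shrinks monotonically
        theta = theta - 2.0 * eta * r * v                    # energy-scaled descent step
    return theta

# Usage on a simple quadratic: f(x) = ||x||^2 / 2, grad f(x) = x
f = lambda x: 0.5 * float(np.dot(x, x))
grad_f = lambda x: x
x_star = aegd_sketch(grad_f, f, theta0=np.ones(5), eta=0.5, c=1.0, n_steps=200)
```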
Similar references
Adaptive Online Gradient Descent
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions ...
Adaptive Variance Reducing for Stochastic Gradient Descent
Variance Reducing (VR) stochastic methods are fast-converging alternatives to the classical Stochastic Gradient Descent (SGD) for solving large-scale regularized finite sum problems, especially when a highly accurate solution is required. One critical step in VR is the function sampling. State-of-the-art VR algorithms such as SVRG and SAGA, employ either Uniform Probability (UP) or Importance P...
Adaptive wavefront control with asynchronous stochastic parallel gradient descent clusters.
A scalable adaptive optics (AO) control system architecture composed of asynchronous control clusters based on the stochastic parallel gradient descent (SPGD) optimization technique is discussed. It is shown that subdivision of the control channels into asynchronous SPGD clusters improves the AO system performance by better utilizing individual and/or group characteristics of adaptive system co...
Stochastic Gradient Descent with GPGPU
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving 1.66 to 6-times accelerations compared to a CPUbased implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prices winning team. Our main idea is to create a hash function of ...
Learning to learn by gradient descent by gradient descent
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorit...
Journal
Journal title: Numerical Algebra, Control and Optimization
Year: 2023
ISSN: 2155-3297, 2155-3289
DOI: https://doi.org/10.3934/naco.2023015